ABSTRACT

We have investigated some statistical properties of the integrated spectra of
galaxies from Kennicutt's (1992a) spectrophotometric atlas. The input to the
analysis is a set of galaxy spectra sampled in 1300 bins between 3750 Å and
6500 Å. We make use of Principal Component Analysis (PCA) to analyse the
1300-dimensional space spanned by the spectra. Their projection onto the plane
defined by the first two principal components, the principal plane, shows that
normal galaxies lie along a quasi-linear sequence that we call the spectral
sequence. We show that the spectral sequence is closely related to the Hubble
morphological sequence. These results are robust in the sense that the reality of the 
spectral sequence does not depend on data normalization. The existence of this se- 
quence suggests that a single parameter may describe the spectrum of normal galaxies. 
We have investigated this hypothesis with Bruzual & Charlot (1995) models of spectral
evolution. We show that, for single age models (15 Gyr), the spectral sequence can be 
parametrized by the characteristic star formation time-scales of the different morpho- 
logical types. By examining the projection of evolutionary tracks of normal galaxies 
onto the principal plane, we verify that the spectral sequence is also an
evolutionary sequence, with galaxy spectra evolving from later to earlier
spectral types. Considering the close correspondence between the spectral and
morphological sequences, this leads
us to speculate that galaxies may evolve morphologically along the Hubble sequence, 
from Sm/Im to E. 

1 INTRODUCTION

Normal galaxies tend to present great morphological reg- 
ularities, which allow us to classify them along the Hub- 
ble sequence (Hubble 1926, Sandage 1961). The integrated 
spectra of nearby galaxies also show remarkable regulari- 
ties. Their general properties have been recently discussed 
by Kennicutt (1992a,b), who has shown that normal galaxies 
have spectra that progress smoothly with the morphological 
type. Since the integrated spectrum of a galaxy represents 
a weighted mean in luminosity of the stellar populations 
that make it up, these results indicate that galaxies of the same
morphological type tend to have similar stellar populations, which provides
the basis for a spectral classification of galaxies (Humason 1936, Morgan &
Mayall 1957).

We present here a study of Kennicutt's (1992a) spec- 
trophotometric atlas, looking for global regularities in the 
integrated spectra of galaxies. This is done with a standard
statistical technique, Principal Component Analysis (PCA). 
PCA has previously been applied to the analysis and objec- 
tive classification of spectra. Francis et al. (1992) have car- 
ried out a study of a large sample of QSO spectra, developing 
a quantitative classification scheme based on the first three 
principal components. Sodre & Cuevas (1994) have analysed 
a set of 24 normal galaxies from Kennicutt's atlas, conclud- 
ing that most of the variance present in their integrated op- 
tical spectra is due to their morphological differences. They 
have shown that these spectra may be parametrized by the morphological type,
which allows a quantitative galaxy classification based on integrated
spectra. Connolly et al. (1995)
have studied the continuum spectra of the central regions of 
ten galaxies, covering a wavelength range from 1200 Å to 1 μm, also finding
that ordinary galaxy spectra can be described by a one-parameter family.
Folkes, Lahav & Maddox
(1996) have used a combination of PCA and artificial neu- 
ral networks for automated classification of galaxies from 
low signal-to-noise spectra. 

Here we describe each spectrum by a point in a high-dimensional space where
each dimension is the flux at a given wavelength. We then apply PCA to obtain
a suitable projection of the spectra onto a plane. This is done by identifying
the orthogonal combinations of the variables with maximum variance, the
'principal components'. Such a projection provides a synthetic view of the
data space, allowing us to investigate correlations between the spectra.

The plan of this paper is as follows. In section 2 we 
describe our input data, taken from Kennicutt's atlas, and
a data subset containing only normal galaxies. Some aspects
of principal component analysis that are useful for the study
presented here are reviewed in section 3. In this section we
also discuss data normalization and scaling, since it is well 
known that it affects the output of PCA. The results of the 
application of PCA to all galaxies, as well as to the subset 
of normal galaxies, are presented in section 4 and discussed 
in section 5. Finally, section 6 summarises our conclusions. 



2 THE DATA 

The 55 integrated spectra investigated here are from Kennicutt's (1992a)
spectrophotometric atlas of galaxies. Most of the observations cover the
wavelength range between 3650 Å and 7100 Å, with 5-8 Å resolution. To avoid
incompleteness in the spectral coverage, we analyse here the rest-frame
wavelength interval 3750-6500 Å. This spectral range was then uniformly
re-sampled with 1300 bins, in order to approximately preserve the original
bin width. The spectra in the atlas are normalized to unity at 5500 Å,
implying that we are not taking into account any dependence on galaxy
luminosity. We discuss below other normalizations and their effect on the
results.
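
The re-sampling and normalization described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `resample_and_normalize`, the use of linear interpolation, and the toy input spectrum are not taken from the atlas or from the analysis itself.

```python
import numpy as np

def resample_and_normalize(wave, flux, lo=3750.0, hi=6500.0, n_bins=1300, ref=5500.0):
    """Resample a rest-frame spectrum onto a uniform grid and set f(ref) = 1."""
    # Centres of n_bins equal-width bins covering [lo, hi].
    width = (hi - lo) / n_bins
    grid = lo + width * (np.arange(n_bins) + 0.5)
    # Linear interpolation is an assumed choice; the text does not specify one.
    resampled = np.interp(grid, wave, flux)
    # Normalize to unity at the reference wavelength.
    return grid, resampled / np.interp(ref, grid, resampled)

# Hypothetical smooth "spectrum" covering the atlas range 3650-7100 Å.
wave = np.linspace(3650.0, 7100.0, 1725)
flux = 1.0 + 0.3 * np.sin(wave / 400.0)

grid, norm_flux = resample_and_normalize(wave, flux)
bin_width = grid[1] - grid[0]   # about 2.1 Å for 1300 bins over 2750 Å
```

With 1300 bins over 2750 Å the bin width is about 2.1 Å, which is the sense in which the re-sampling approximately preserves the original sampling.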

We have analysed two sets of data, one containing all galaxies (AGS), and a
subset of AGS with normal galaxies only (NGS). The sample of normal galaxies
contains 23 objects and covers the Hubble sequence from E to Im, avoiding
objects with any evidence of peculiarity (e.g. AGNs, starbursts, mergers).
The galaxies in this set are: NGC3379 (E0), NGC4472 (E1/S0), NGC4648 (E3),
NGC4889 (E4), NGC3245 (S0), NGC3941 (SB0/a), NGC4262 (SB0), NGC5866 (S0),
NGC1357 (Sa), NGC2775 (Sa), NGC3368 (Sab), NGC3623 (Sa), NGC1832 (SBb),
NGC3147 (Sb), NGC3627 (Sb), NGC4775 (Sc), NGC5248 (Sbc), NGC6217 (SBbc),
NGC2903 (Sc), NGC4631 (Sc), NGC6181 (Sc), NGC6643 (Sc), and NGC4449 (Sm/Im).
Although small, the NGS allows us to explore the connection between normal
galaxy spectra and morphology.



3 PRINCIPAL COMPONENT ANALYSIS 
3.1 General Principles 

Suppose we have a sample of N integrated spectra of galaxies, all covering
the same rest-frame wavelength range. Each spectrum is described by an
M-dimensional vector X containing the galaxy flux at M uniformly sampled
wavelengths. Let S be the M-dimensional space spanned by the 'spectral'
vectors X. An integrated spectrum, then, is a point in S-space, and the
spectra in the sample form a cloud of points in S. If the spectral vectors of
normal galaxies are indeed correlated with their morphology, one would expect
these points to be arranged more or less along a line, mimicking the Hubble
morphological sequence. It is impossible, of course, to visualize how the data
are distributed in high-dimensional spaces, like the 1300-dimensional spectral
space discussed in the next section. An alternative is to employ some
technique for dimension reduction by projecting the data onto, say, two
dimensions.

Here we analyse the data with a standard technique, Principal Component
Analysis (PCA). It is also known as the Hotelling transform, or the discrete
Karhunen-Loève transform, but we call it PCA because this designation is more
often used in Astronomy (e.g. Murtagh & Heck 1987 and references therein).

PCA is an orthogonal transformation that allows build- 
ing more compact linear combinations of the data that 
are optimal with respect to the mean square error crite- 
rion. A detailed description of PCA can be found in several 
books on statistics or pattern recognition (e.g. Kendall 1975, 
Fukunaga 1990), as well as in the astronomical literature 
(e.g. Brosche 1973, Whitney 1983, Efstathiou & Fall 1984,
Lahav et al. 1996). Here we restrict ourselves to a short out- 
line of the procedure, emphasizing only the aspects that are 
relevant for our analysis. 

Consider a set of N objects (galaxies), each with M features (flux at M given
wavelengths). Let $x_{ij}$ be the spectrum of the i-th object, i.e., the flux
(or a scaled version of it, see below) at the j-th wavelength. It is often
more useful to do the analysis with a pre-processed version of the data. For
example, it is convenient to subtract the sample mean from each spectrum.
This is equivalent to putting the origin of the coordinate system at the
barycentre of the data in the spectral space S. It may be interesting,
additionally, to re-scale each variable to unit variance. The first case
corresponds to PCA of the covariance matrix, while the second is the same
as PCA of the correlation matrix.
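
A minimal sketch of the two pre-processing options, assuming the spectra are stored as the rows of an array `X` of shape (N, M). The function name `preprocess` and the toy data are illustrative only.

```python
import numpy as np

def preprocess(X, method="covariance"):
    """X has shape (N, M): N spectra (rows), M wavelength bins (columns)."""
    Z = X - X.mean(axis=0)        # move the origin to the sample barycentre
    if method == "correlation":
        # Unit variance at every wavelength. A bin with zero variance
        # (e.g. the bin used to normalize all spectra to unity) would
        # need special handling; that case is not treated here.
        Z = Z / X.std(axis=0)
    elif method != "covariance":
        raise ValueError("method must be 'covariance' or 'correlation'")
    return Z

# Toy data: 55 "spectra" with 3 "wavelengths" of very different variance.
rng = np.random.default_rng(0)
X = rng.normal(loc=1.0, scale=[0.1, 0.5, 2.0], size=(55, 3))

Z_cov = preprocess(X, "covariance")    # zero mean, original variances kept
Z_cor = preprocess(X, "correlation")   # zero mean, unit variance per column
```

Running PCA on `Z_cov` corresponds to diagonalizing the covariance matrix, and on `Z_cor` to diagonalizing the correlation matrix.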

Let us then assume that the spectral vectors X have 
zero mean. The covariance matrix of the data in this case is 

$$C_{jk} = \frac{1}{N} \sum_{i=1}^{N} x_{ij}\, x_{ik} \qquad (1)$$

Now, let us consider a new vector, Y, which is a transformed 
version of X, given by 

Y = AX (2) 

where A is an M × M matrix whose rows are the eigenvectors
of the covariance matrix C.
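
The transform can be sketched numerically as below. This is an illustration rather than the code used for the analysis: it assumes a 1/N normalization of the covariance matrix, stores spectra as rows, and uses synthetic data; the function name `pca` is hypothetical.

```python
import numpy as np

def pca(X):
    """Rows of X are zero-mean spectra. Returns eigenvalues, A, and Y."""
    N = X.shape[0]
    C = X.T @ X / N                        # covariance matrix (1/N assumed)
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: for symmetric matrices
    order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
    eigvals = eigvals[order]
    A = eigvecs[:, order].T                # rows of A are the eigenvectors
    Y = X @ A.T                            # Y = A X, applied to every row
    return eigvals, A, Y

# Synthetic correlated data standing in for 55 spectra with 6 bins.
rng = np.random.default_rng(0)
X = rng.normal(size=(55, 6)) @ rng.normal(size=(6, 6))
X = X - X.mean(axis=0)

eigvals, A, Y = pca(X)
cov_Y = Y.T @ Y / X.shape[0]               # should be diag(eigvals)
```

For real data with N much smaller than M (55 spectra, 1300 bins), a singular value decomposition of `X` would give the same components more cheaply; the explicit covariance matrix is used here only because it mirrors the text.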

This transform has several interesting properties. The covariance matrix of Y
is diagonal, with elements equal to the eigenvalues $\lambda_k$ of C. This
means that the transformed vector components $y_k$ are uncorrelated.
Additionally, each eigenvalue $\lambda_k$ is equal to the variance of the
k-th element of Y. Since C is a real and symmetric matrix, its eigenvectors
can be chosen to be orthonormal, so that the inverse of A is equal to its
transpose, $A^{-1} = A'$. Then, it follows that X can be reconstructed from Y
by using the relation

$$X = A'Y \qquad (3)$$



Suppose, however, that instead of using all the M components of Y, we form a
new matrix, $A_K$, from the K eigenvectors corresponding to the K largest
eigenvalues. The Y vectors will then be K-dimensional and the reconstruction
will no longer be exact. Let

$$\hat{X} = A_K' Y \qquad (4)$$

represent the approximation of X obtained with the transformation matrix
$A_K$. It can be shown that the mean square error between X and $\hat{X}$ is
given by

$$\left\langle \| X - \hat{X} \|^2 \right\rangle = \sum_{j=K+1}^{M} \lambda_j \qquad (5)$$



This equation indicates that the reconstruction is exact if 
K = M. Also, for a given K, the error is minimized by se- 
lecting the eigenvectors associated with the K largest eigen- 
values. Thus, the PCA transform is optimal in the mean 
square error sense. 
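
The relation between the reconstruction error and the discarded eigenvalues can be checked numerically. The sketch below uses synthetic data and assumes a covariance matrix normalized by 1/N, with the error averaged over the N spectra; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(55, 8)) @ rng.normal(size=(8, 8))   # toy "spectra"
X = X - X.mean(axis=0)
N, M = X.shape

# Eigen-decomposition of the covariance matrix, largest eigenvalue first.
eigvals, eigvecs = np.linalg.eigh(X.T @ X / N)
order = np.argsort(eigvals)[::-1]
eigvals, A = eigvals[order], eigvecs[:, order].T

K = 2
A_K = A[:K]              # K leading eigenvectors as rows
Y = X @ A_K.T            # K-dimensional representation
X_hat = Y @ A_K          # truncated reconstruction

mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))   # mean square error
discarded = eigvals[K:].sum()                     # sum of dropped eigenvalues
```

In this setup `mse` and `discarded` agree to rounding error, and choosing any other set of K eigenvectors gives a larger error.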

Note that, contrary to most implementations of PCA, here we do not
re-normalize each transformed component by its corresponding
$\lambda_k^{1/2}$, since this gives unit variance along the new axis,
changing the metric of the transformed space.

The PCA transform has an important difference when 
compared to other orthogonal transforms: the basis vectors 
in ordinary orthogonal transforms (e.g. Fourier) are fixed, 
while in PCA they are the eigenvectors of the covariance 
matrix and, hence, they are data dependent. 

The main body of this paper contains the analysis and discussion of the
projected distribution of the spectra onto the plane $(y_1, y_2)$ defined by
the first two components of Y. We call it the principal plane. The first
principal component is taken to be along the direction in the M-dimensional
spectral space with the maximum variance. The second principal component is
constrained to lie in the subspace perpendicular to the first and, within
that subspace, it is also taken along the direction with the maximum
variance. Then, the principal plane is the plane that contains the maximum
variance in the spectral space. In this sense it is the most informative
plane contained in the data space.

3.2 Data Scaling 

It is well known that the set of orthonormal coordinates re- 
sulting from an application of PCA is affected by the scaling 
of the data, and then the projection of the spectra onto the 
plane defined by the first two components is different for 
distinct data normalizations. The problem, here, is that the
scaling may affect lines and continuum differently.

Consider, first, the normalization of the spectra. Results are different if
the spectra are set to unity at 5500 Å, $f_{5500} = 1$, as they appear in
Kennicutt's atlas, or if they are normalized to have the same mean flux
within the wavelength range of interest, as in the analysis of Francis et
al. (1992). Connolly et al. (1995) normalize each spectrum to unit norm,
$\sum_\lambda f_\lambda^2 = 1$, and notice that the results are similar to
the latter case above. The coefficients of the expansion, in this case, are
direction cosines (e.g. Whitmore 1984), and the spectra are now distributed
on the surface of a hypersphere of unit radius in the spectral space. Here we
consider only the normalization of the spectra to the same mean flux,
$\sum_\lambda f_\lambda = 1$. We have repeated the analysis with the other
normalizations, verifying that they do not affect our main results.

Table 1. Percentage of the variance explained by the first two
principal components

method        sample   cumulative variance (%)

covariance    NGS      94.6
              AGS      93.1
correlation   NGS      86.5
              AGS      85.0
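
The three normalizations discussed in this section can be written compactly. The sketch below is illustrative: the function names are hypothetical, and the "same mean flux" case is implemented as unit total flux, which fixes the mean flux to a common value for spectra sampled on the same grid.

```python
import numpy as np

def norm_at_wavelength(wave, flux, ref=5500.0):
    """Set the flux to unity at a reference wavelength (atlas convention)."""
    return flux / np.interp(ref, wave, flux)

def norm_total_flux(flux):
    """Same mean flux for every spectrum (here: total flux equal to one)."""
    return flux / flux.sum()

def norm_unit_length(flux):
    """Unit Euclidean norm, so spectra lie on a unit hypersphere."""
    return flux / np.sqrt(np.sum(flux ** 2))

# Toy spectrum on the grid used in the analysis (bin centres).
wave = 3750.0 + (2750.0 / 1300) * (np.arange(1300) + 0.5)
flux = 2.0 + np.exp(-0.5 * ((wave - 5007.0) / 5.0) ** 2)   # continuum + one line

f_a = norm_at_wavelength(wave, flux)
f_b = norm_total_flux(flux)
f_c = norm_unit_length(flux)
```

The three results differ only by a multiplicative constant per spectrum, but since that constant varies from galaxy to galaxy, the covariance structure of the sample, and hence the principal components, can change.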

As discussed in the previous section, the data are usually pre-processed
before the analysis. Flux values at each wavelength can be scaled to zero
mean (which we call the covariance method) or to zero mean and unit variance
(the correlation method). The variance in the lines and in the continuum is
equal in this last case. In order to control the effects due to scaling, we
have applied both methods to the analysis of our data sets. It is worth
pointing out, as we shall see, that our results are also invariant to the
different data scalings, because the linear nature of the transform
considered here preserves correlations between variables, independent of
scaling (e.g. Francis et al. 1992).



4 RESULTS 

We have applied PCA to a sample containing the spectra of all galaxies in
Kennicutt's atlas (set AGS), as well as to a subsample comprising only normal
galaxies (set NGS). The data were scaled before the analysis according to one
of the procedures discussed above.

We consider first the compression achieved by the two transformed components
$(y_1, y_2)$. The results are summarized in Table 1, which gives the
cumulative fraction of the variance explained by these components for the two
data sets, and for different scalings. We notice that there are no
significant differences between the two sets. With the covariance method,
~ 93-95% of the total variance is described by the first two terms of the
expansion. For the analysis with the correlation matrix, the cumulative
variance is lower, but at least ~ 85% of the data variance is still contained
in two components.
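
The cumulative variance quoted in Table 1 is the ratio between the sum of the leading eigenvalues and the sum of all eigenvalues. A minimal sketch, with a hypothetical function name and made-up eigenvalues (not those of the atlas):

```python
import numpy as np

def cumulative_variance(eigvals, k=2):
    """Fraction of the total variance carried by the k largest eigenvalues."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]
    return eigvals[:k].sum() / eigvals.sum()

# Hypothetical eigenvalue spectrum, for illustration only.
example = [8.0, 1.5, 0.3, 0.2]
frac = cumulative_variance(example, k=2)   # (8.0 + 1.5) / 10.0 = 0.95
```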



4.1 Analysis of the normal galaxies set (NGS)

Figure 1 shows the projection of the spectra of the NGS onto their principal
plane. The most remarkable aspect of this figure is that the projected
spectra of almost all galaxies are arranged along a quasi-linear sequence,
which we shall call the spectral sequence. There are some outliers from the
sequence for the covariance method (figure 1a), corresponding mainly to
late-type galaxies. For instance, the point with the largest value of $y_1$
in figure 1 corresponds to the Sm/Im galaxy NGC4449. This is due, at least
partially, to the 'nebular' nature of the spectrum of this galaxy (see
section 4.3). Note that the Magellanic irregulars in the atlas are completely
dominated by young stars and HII regions, and their spectra may not be
representative of this class (Kennicutt 1992a). The sequence width is
relatively larger for the correlation method (figure 1b), but even in this
case it is clearly defined (the ratio between the standard deviations of
$y_1$ and $y_2$ is $(\lambda_1/\lambda_2)^{1/2} \simeq 3.7$).

Figure 1. Projection of the spectra of the NGS onto the principal plane:
(a) covariance method; (b) correlation method. Different symbols correspond
to different morphological types: E (open circles), S0 (filled circles),
Sa-Sbc (open triangles), Sc-Im (filled triangles). Also shown in the figure
are the projections of the mean spectra of these 4 morphological groups
(crosses): E, S0, Sa-Sbc and Sc-Im (from left to right).

Figure 2. The first principal component, $y_1$, versus T-types for the NGS
and the covariance method.



Most of the variance in the NGS seems to be due to the morphological mix of
the sample. We also show in figure 1 the projection of the mean spectra of 4
morphological groups (E, S0, Sa-Sbc, Sc-Im) onto the principal plane as
crosses. Clearly, these groups are arranged along the spectral sequence,
keeping a ranking analogous to the Hubble sequence. It can be verified,
however, that there is significant overlap of spectra of galaxies that are in
distinct morphological groups. Indeed, an inspection of the spectra shows
that there is a large scatter of spectral properties within each
morphological group, and it is not difficult to find a bona fide galaxy of a
given morphological type with a spectrum typical of galaxies of another type.
Note that part of this discrepancy should also be attributed to the fuzzy
nature of the morphological classification, as discussed by Naim et al.
(1995) and Lahav et al. (1995), who found a dispersion of 1.8 T-units in the
classification of a set of 830 APM galaxies by 6 experts.

In figure 2 we plot the first principal component $y_1$ versus the Hubble
type, measured in the T-type system of RC3 (de Vaucouleurs et al. 1991), for
the covariance method (results are similar for the correlation method). This
plot presents a clear (non-linear) correlation between $y_1$ and T-type. In
fact, Spearman's rank-order correlation coefficient between T-types and
$y_1$ is high, 0.93 and 0.90 for the covariance and correlation methods,
respectively, confirming that the correlation between these two quantities is
significant. The same analysis done with the second principal component fails
to show any significant correlation between T and $y_2$. We conclude, then,
that the spectral sequence correlates strongly with the Hubble morphological
sequence. This result is relevant for galaxy classification since it allows
one to ascribe a type to a normal galaxy from its spectrum alone (e.g. by its
value of $y_1$ or by its position along the spectral sequence).
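
Spearman's coefficient is the ordinary linear correlation computed on ranks, which is why it is suited to the monotonic but non-linear relation seen in figure 2. A minimal NumPy sketch follows; the T-types and component values are made up for illustration and are not the measurements behind figure 2.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]

# Hypothetical T-types (with ties) and first-component values.
t_type = np.array([-5, -5, -2, 1, 1, 3, 5, 5, 9])
y1 = np.array([-3.1, -3.0, -2.8, -2.0, -2.2, -1.0, 1.5, 2.5, 9.0])
rho = spearman(t_type, y1)

# A strictly monotonic but non-linear relation gives a coefficient of 1.
rho_monotonic = spearman([1, 2, 3, 4], [1, 8, 27, 64])
```

Ties are handled with average ranks because several galaxies share the same T-type.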

Another point of interest contained in figure 2 is that the spectral
variation, as measured by $y_1$, is slow from E to Sab, increasing quickly
for later types. This result is also independent of the data scaling, and
reveals that the spectral distance between E (T=-5) and Sab (T=2) galaxies is
smaller than between Sb (T=3) and Sc (T=5). Kennicutt (1992a) arrived at the
same conclusion, arguing that unless one were able to measure the blue
continuum colours or the [OII]λ3727 emission line to high accuracy, it would
be difficult to distinguish the spectrum of an Sa-Sb from that of an E-S0
galaxy. Figure 2 also shows that the variance of $y_1$ increases with T,
indicating that late-type galaxies present a richer spectral variety than
early types, at least for the sample studied here.



4.2 Analysis of all galaxies (AGS)

Figure 3 shows the projection of the AGS spectra onto their principal plane
for the covariance method. This set includes normal galaxies, AGN, starburst,
and interacting or merging galaxies. Most of the variance, now, is due to
only two galaxies: Mk59 and Mk71. They are Sm galaxies with spectra dominated
by HII-region emission, and here too the nebular nature of their spectra, as
well as the faintness of their continuum, seems to be the cause of their
position in the figure. The point to be stressed, however, is that the normal
galaxies are again arranged along a sequence. This is best seen in figure 3b,
where we show an enlarged view of part of figure 3a. The projections
resulting from the correlation method are shown in figure 4, and are
qualitatively similar to those displayed in figure 3. The spectral sequence
is clearly visible, which confirms that it is an intrinsic property of the
integrated spectra of normal galaxies.

How are galaxies other than the normal ones distributed in the $(y_1, y_2)$
plane? The answer now depends on the scaling and, for most types of peculiar
galaxies, their location with respect to the normal galaxies varies from plot
to plot in figures 3 and 4. An important exception is the group of starburst
nuclei NGC3471, NGC5996 and NGC7714, which consistently fall over the Sc-Im
region of the spectral sequence.

Figure 3. Projection of the spectra of AGS onto the principal plane for the
covariance method. The projected spectra of normal galaxies are represented
here as filled hexagons, while those of peculiar galaxies are shown as open
circles. The mean spectra are represented by crosses (see caption of figure
1). Figure 3b shows an enlarged view of part of 3a, to enhance the spectral
sequence.

Figure 4. Same as figure 3a, but for the correlation method.



4.3 The nature of the principal components 

PCA is a linear transform, and so each principal component is a linear
combination of the flux at all wavelengths. We plot in figure 5 the weights
associated with each wavelength for the first two principal components for
galaxies in the NGS. This figure is useful because it provides insight into
what determines $y_1$ and $y_2$ for each type of data scaling. Results for
the AGS are similar.

Figure 5a shows the weights corresponding to $y_1$ for the covariance method.
They are the components of the first eigenvector of the covariance matrix.
The overall weight distribution is regular, being positive in the blue and
decreasing to negative values for longer wavelengths. Blue galaxies, then,
tend to have larger values of $y_1$ than the red ones. But large weights are
associated with the wavelengths of emission lines, mainly [OIII]λλ4959,5007
and Hβ. This might lead one to suppose that $y_1$ is actually measuring the
strength of these lines. To clarify this point, we repeated the analysis
excluding small regions around the Balmer lines and the [OIII] nebular lines.
The projection of the galaxies onto their principal plane looks like that
shown in figure 1b, with the two previous late-type outliers of figure 1a now
lying along the sequence. Note that most galaxies in the NGS present
negligible or small emission at the wavelengths of the [OIII] lines and Hβ,
and then the information carried by these wavelengths is proportional to the
continuum flux at ~ 5000 Å. Galaxies with a strong nebular spectrum will have
larger values of $y_1$ because they tend to be bluer than the others and,
mainly, because of the large and positive contribution of their line flux. We
conclude that the information associated with the spectral sequence comes
essentially from the continuum, with a significant contribution of emission
lines, when present. This is confirmed by figure 5b, which shows the weights
corresponding to $y_2$. These weights are almost zero, except at the
wavelengths close to the emission lines, where they present an oscillation.
These oscillations act like a line detector, as they compute a linear
combination of differences between the flux at the line and the nearby
continuum (adjacent continuum - line, actually): $y_2$ is near zero for
spectra without emission lines, and negative for those where the lines are
prominent. This explains the position of the outliers in figure 1a.
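
The 'line detector' behaviour of an oscillating weight pattern can be illustrated with a toy example. The weights below are invented for the purpose (negative at the line, positive on the adjacent continuum, summing to zero) and are not the actual eigenvector of figure 5b.

```python
import numpy as np

# Toy weight vector over 11 wavelength bins: an oscillation around bin 5.
weights = np.zeros(11)
weights[5] = -1.0          # at the emission line
weights[[4, 6]] = 0.5      # on the adjacent continuum

continuum_only = np.ones(11)       # spectrum without emission lines
with_line = continuum_only.copy()
with_line[5] += 4.0                # same continuum plus a strong line

y2_flat = weights @ continuum_only   # 0.0: continuum contributions cancel
y2_line = weights @ with_line        # -4.0: (adjacent continuum - line)
```

Because the weights sum to zero, a featureless continuum contributes nothing, while an emission line drives the projection negative in proportion to its excess flux.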

The covariance method preserves the differences between the continuum and the
lines, and our result therefore reveals that a large fraction of the variance
of the sample comes from wavelengths associated with Hβ and the [OIII]
nebular lines. In the correlation method, lines and continuum are put to the
same variance, and the resulting weights are quite different, as shown in
figures 5c and 5d. The continuum now provides a more significant contribution
to $y_1$ than the emission lines. The distribution of the weights shows that
$y_1$ is measuring the flux difference between the blue and red parts of a
spectrum, with a non-negligible contribution of some lines. The weights
associated with $y_2$ in the correlation method are shown in figure 5d. Their
behaviour is quite different from that shown in figure 5b for the covariance
method. Now $y_2$ is essentially probing the wavelength interval between
4300 Å and 5200 Å. Then, to first order, $y_1$ is measuring a colour and
$y_2$ the relative flux at ~ 4750 Å.

The discussion in this section reveals that the detailed interpretation of
the principal components is quite dependent on the data set. At the same
time, it indicates that it is the continuum that gives the major contribution
to the first component, but nebular emission may also be important.

4.4 Effect of the errors 

The overall accuracy of the spectrophotometry of the galaxies in the atlas is
~ 5-10%. To investigate the effect of these errors we have adopted the
following procedure. For two normal galaxies, NGC3379 (E0) and NGC4631 (Sc),
we produced 100 simulated spectra by adding Gaussian noise with standard
deviation equal to 0.1 times the flux at each wavelength to the two observed
spectra. For each data set and type of scaling, the simulated spectra were
pre-processed and projected onto the principal plane. We have verified that
the errors have more effect on the late-type galaxy spectrum than on the
elliptical spectrum, due to the emission lines present in the spectrum of the
former object. Their effect on the results presented here is, however,
negligible.

Figure 5. Weights (components of eigenvectors) associated with the principal
components $y_1$ and $y_2$ for the NGS: (a) $y_1$, covariance method; (b)
$y_2$, covariance method; (c) $y_1$, correlation method; (d) $y_2$,
correlation method.
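
The error test described above can be sketched as follows. Everything in this example is a stand-in: the spectrum, the sample mean and the principal plane are synthetic, the function names are hypothetical, the 100 realizations are assumed to be drawn per spectrum, and only the covariance-method pre-processing (mean subtraction) is shown.

```python
import numpy as np

def noisy_realizations(flux, n_sim=100, frac=0.1, seed=0):
    """Return n_sim copies of flux with Gaussian noise, sigma = frac * flux."""
    rng = np.random.default_rng(seed)
    return flux + rng.normal(size=(n_sim, flux.size)) * frac * np.abs(flux)

def project(spectra, mean_spectrum, plane):
    """Project spectra onto a plane given as a (2, M) array of orthonormal rows."""
    return (spectra - mean_spectrum) @ plane.T

# Hypothetical stand-ins for an observed spectrum, the sample mean and the plane.
M = 200
rng = np.random.default_rng(42)
flux = 1.0 + 0.2 * np.cos(np.linspace(0.0, 3.0, M))
mean_spectrum = np.ones(M)
q, _ = np.linalg.qr(rng.normal(size=(M, 2)))   # two orthonormal directions
plane = q.T

sims = noisy_realizations(flux)                 # shape (100, M)
coords = project(sims, mean_spectrum, plane)    # shape (100, 2)
scatter = coords.std(axis=0)                    # noise-induced spread in the plane
```

Comparing `scatter` with the extent of the spectral sequence in the same plane gives a measure of how much the photometric errors can move a galaxy along or across the sequence.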



5 DISCUSSION 

5.1 The spectral sequence 

The results of the previous section allow us to conclude that the spectral
sequence is real in the sense that the spectra of normal galaxies form a
sequence in the spectral space. This result is robust because, as we have
seen, it is independent of data normalization and scaling. Note that figures
1a and 3 represent projections of the same data space (for the covariance
method) onto different planes: while in figure 1a the plane maximizes the
variance of the NGS, the principal plane in figure 3 maximizes the variance
of the AGS. The same is true for figures 1b and 4 for the correlation method.

It is well known that several parameters related to spectra, like colours or
spectral indices, correlate well with the Hubble sequence (e.g. Roberts &
Haynes 1994). We had already noticed (Sodre & Cuevas 1994) that in a data
space defined by spectral parameters, like the amplitude of the 4000 Å break
and the strength of the G band or the Mg2 index, the morphological types are
distributed along a one-dimensional sequence. Our results here indicate that
these empirical correlations are due to the global regularities present in
galaxy spectra, illustrated here by the spectral sequence. It provides
quantitative support for a one-dimensional description of the general
properties of normal galaxies, like the Hubble morphological sequence. It
also indicates that there is no dividing line in spectral properties between
elliptical and spiral galaxies, and that lenticulars have spectra
intermediate between those two morphological groups (cf. figures 2-4).

The existence of the spectral sequence and its relation with the
morphological sequence has been recognized by Connolly et al. (1995). They
applied a procedure similar to ours to a sample of 10 composite spectra of
the central regions of normal and starburst galaxies, each covering a
wavelength range of 1200 Å to 1 μm. Interestingly, they found that their 10
spectra were arranged along a spectral sequence, in disagreement with our
results, where the spectral sequence is defined only by the normal galaxies.
This is due to two effects. First, they have removed the strongest nebular
emission lines from the starburst spectra, decreasing the variance associated
with the lines (cf. section 4.3). Second, a visual examination of their
spectra indicates that most of the variance, now, is coming from the UV
continuum (λ ≲ 3000 Å), a region not included in our analysis. This
discussion confirms how dependent on the input data the results of PCA are!
Another point that is worth mentioning is that Connolly et al. (1995) found a
curved spectral sequence, while in our study it is more or less linear. This
is a consequence of their normalization by the scalar product, which projects
the spectra onto the surface of a hypersphere in the spectral space.

Taken together, these results increase the robustness of the spectral
sequence of normal galaxies, as it appears in all these analyses of galaxy
spectra, irrespective of the wavelength interval, spectral resolution, or the
detailed form of the input. Consequently, we expect that a study of a sample
that includes, say, Hα, will lead to the same general conclusions, although
the projection of the galaxies onto the new principal plane will probably be
different.

The principal plane is convenient for classification of normal galaxies,
because we can ascribe an objective spectral type to a normal galaxy from its
position along the sequence. The spectra of peculiar galaxies, however,
occupy positions in the principal plane that depend on whether one uses the
covariance or the correlation method, since the relative role of continuum
and lines is different for each of the two scaling methods. Hence, the
principal plane is not adequate for classification of non-normal galaxies,
i.e., one needs more than just two dimensions to describe the general
manifold of integrated spectra of galaxies.

An obvious advantage of spectral classification over morphological
classification is that the former is better suited to a quantitative approach
than the latter. For instance, Folkes, Lahav & Maddox (1996) developed a
method for galaxy classification from low signal-to-noise spectra typical of
redshift surveys. They have simulated spectra of normal galaxies with the
parameters of the 2dF Galaxy Redshift Survey. Using a combination of PCA and
artificial neural networks, they show that typically 8 principal components
are needed to describe noisy spectra of normal galaxies and be able to
classify them. Their results indicate that ~ 95% of the spectra are correctly
classified in 5 morphological groups at $b_J = 19.7$, the limiting magnitude
of the survey. Since they also used Kennicutt's data, their PCA results are
very similar to ours, even though their sample of normal galaxies contains 3
galaxies more than ours.

5.2 Model spectra 

The principal plane provides us with a useful tool to analyse 
spectra. We illustrate this point with the following exercise. 
The very existence of the spectral sequence, and its correlation with the
Hubble sequence, indicates that one single parameter may be responsible for
the integrated spectra of normal galaxies, and for the morphological
sequence.
For instance, Gallagher, Hunter & Tutukov (1984), Sandage 
(1986) and Ferrini & Galli (1988), suggest that several prop- 
erties of the Hubble sequence can be explained by variations 
in the star formation rate of galaxies. 

We have investigated this hypothesis with the Bruzual &
Charlot (1995, hereafter B&C) revised models of spectral
evolution of galaxies (see also Bruzual & Charlot 1993). We
have only considered models with a single parameter, the
characteristic star formation time-scale of a galaxy, τ, where
the SFR decreases with time t as exp(-t/τ). We assume a
Salpeter IMF, with lower and upper mass limits equal to 0.1
and 125 M⊙, respectively, and neglect gas recycling. Each
model results in a spectrum sampled at 238 channels within
the wavelength range of interest here. For consistency, we
have computed the principal components of our two data
sets using the same 238 channels as the theoretical spectra,
instead of our previous 1300 wavelengths. The cumulative
variance explained by the principal plane is not strongly
affected by this reduction of spectral resolution. For the AQS, it
is now 93.7% and 86.0% for the covariance and correlation
methods, respectively.
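
The quantity quoted above, the cumulative variance explained by the principal plane under the two scalings, can be illustrated with a minimal Python sketch. It is not the code used for this work: the function name `principal_plane_variance`, the synthetic two-component "spectra", the sample size of 30 and the noise level are all assumptions made for illustration, and only the number of channels (238) and the wavelength range are taken from the text.

```python
import numpy as np

def principal_plane_variance(spectra, method="covariance"):
    """Fraction of the total variance carried by the first two principal components.

    spectra: (n_galaxies, n_channels) array, one spectrum per row.
    method:  "covariance" only centres each channel; "correlation" also
             scales each channel to unit variance (channels with zero
             variance are not handled in this sketch).
    """
    x = np.asarray(spectra, dtype=float)
    x = x - x.mean(axis=0)
    if method == "correlation":
        x = x / x.std(axis=0, ddof=1)
    elif method != "covariance":
        raise ValueError("method must be 'covariance' or 'correlation'")
    # Squared singular values of the centred data are proportional to the
    # eigenvalues of the covariance (or correlation) matrix.
    variances = np.linalg.svd(x, compute_uv=False) ** 2
    return variances[:2].sum() / variances.sum()

# Synthetic stand-in for a sample of spectra (NOT real galaxy data):
# each "galaxy" is a mixture of a red and a blue continuum plus noise.
rng = np.random.default_rng(0)
wavelength = np.linspace(3750.0, 6500.0, 238)   # 238 channels, as in the text
red = wavelength / 5000.0
blue = 5000.0 / wavelength
mix = rng.uniform(0.0, 1.0, size=30)            # hypothetical sample of 30 objects
spectra = np.outer(1.0 - mix, red) + np.outer(mix, blue)
spectra += 0.01 * rng.standard_normal(spectra.shape)

frac_cov = principal_plane_variance(spectra, "covariance")
frac_corr = principal_plane_variance(spectra, "correlation")
print(f"covariance method:  {100 * frac_cov:.1f}% of the variance in the principal plane")
print(f"correlation method: {100 * frac_corr:.1f}% of the variance in the principal plane")
```

In this toy sample the correlation method gives equal weight to channels where the signal is weak and the noise dominates, which is one simple way in which the two scalings can yield different variance fractions.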

Let us first consider single-age models (15 Gyr). In
figure 6 we show, for several values of τ, the projection of
the model spectra onto the principal plane defined by the
galaxies in each of our two data sets. The model spectra
overlap the spectral sequence for the two methods, although
they tend to fall below the mean types for the correlation
method. For each data set and normalization, the spectrum
corresponding to a single instantaneous burst (τ = 0, i.e.,
B&C model bc95-ssp-sp-l) falls just at the left extreme of
the sequence. At the opposite extreme, the spectrum of a
composite stellar population with constant star formation
rate (τ = ∞, i.e., model bc95-cons-sp-l), a simple model for
irregular galaxies, is near the mean locus of Sc galaxies in
the principal plane. Other values of τ are distributed along
the sequence, between these two extremes. Figure 6
indicates that the parameter τ provides a good parametrization
of the spectral sequence, from elliptical to irregular galaxies
(this seems not to be quite true for the correlation
method applied to the NQS, but even in this case the results are
satisfactory if one takes into account the simplicity of the
models considered here). This result is in agreement with
the conclusions of Kennicutt, Tamblyn & Congdon (1994),
who have shown that the photometric properties of spiral
galaxies along the Hubble sequence are predominantly due
to changes in the star formation histories of disks, and only
secondarily to changes in the bulge-to-disk ratio, which also
varies systematically along the morphological sequence.
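
To give a feeling for how the single parameter τ orders the models, the following Python sketch evaluates the mass-weighted mean age of the stars formed under SFR ∝ exp(-t/τ) for a 15 Gyr old model and the τ values of figure 6. This is only an illustration under stated assumptions: the mean stellar age is not a quantity used in the analysis above, the function name `mean_stellar_age` is hypothetical, and the calculation counts all stars ever formed (no recycling and no stellar mass loss).

```python
import math

def mean_stellar_age(age_gyr, tau_gyr):
    """Mass-weighted mean age (Gyr) of stars formed with SFR(t) ~ exp(-t / tau)
    between t = 0 and t = age_gyr. Illustrative helper, not part of B&C."""
    if tau_gyr == 0:             # instantaneous burst: all stars form at t = 0
        return age_gyr
    if math.isinf(tau_gyr):      # constant SFR: stars form uniformly in time
        return age_gyr / 2.0
    x = age_gyr / tau_gyr
    if x > 700.0:                # avoid overflow in expm1; burst-like limit
        mean_formation_time = tau_gyr
    else:
        # <t> = integral(t e^{-t/tau}) / integral(e^{-t/tau}) over [0, age]
        #     = tau - age / (exp(age / tau) - 1)
        mean_formation_time = tau_gyr - age_gyr / math.expm1(x)
    return age_gyr - mean_formation_time

taus = [0, 3, 5, 7, 10, 15, math.inf]      # Gyr, the values shown in figure 6
mean_ages = [mean_stellar_age(15.0, tau) for tau in taus]
for tau, mean_age in zip(taus, mean_ages):
    print(f"tau = {tau:>4} Gyr -> mean stellar age = {mean_age:.2f} Gyr")
```

Under these assumptions the mean stellar age decreases monotonically as τ grows, from the full age of the model for an instantaneous burst to half of it for a constant star formation rate, which is consistent with a single parameter running from early to late spectral types.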

Figure 6. Projection of B&C model spectra for several values of the mean star formation time scale τ and age equal to 15 Gyr onto
the principal plane: (a) NQS, covariance method; (b) NQS, correlation method; (c) AQS, covariance method; (d) AQS, correlation
method. Model spectra are represented by open squares. They correspond, from left to right, to τ = 0, 3, 5, 7, 10, 15 and ∞ Gyr (see
text for details). Other symbols have the same meaning as in figure 3. In figures 6c and 6d, only the part of the principal plane occupied
by the spectral sequence is shown.

We plot in figure 7 evolutionary tracks for the τ = 0 and
τ = ∞ models, with galaxy ages running from 0 to 20 Gyr.
The evolutionary track of an instantaneous burst, for the
covariance method, overlaps the spectral sequence, that is,
its spectrum evolves from late to early types along the
sequence. The same is partially true for the constant star
formation rate model, but in this case the oldest spectrum falls
near the centroid of normal Sc galaxies. Hence, in the
principal plane obtained with the covariance method, the spectral
sequence not only characterizes the locus of normal
galaxies but is almost coincident with their evolutionary tracks.
Figure 7 also shows that, for the correlation method, these
tracks present more structure, but the same evolutionary trend
holds. It is worth pointing out that these tracks correspond
to very simple models, and 'cosmic variance', e.g. differences
in the epoch of galaxy formation, IMF, SFR, metallicity,
dust content, etc., may well lead to models able to cover
the whole spectral sequence. Note that these tracks do not
explain why some galaxies lie off the spectral sequence.
Although most of them probably have a more complex star
formation history than the simple exponentially decreasing
star formation rate considered here, the main reason is that
B&C spectra do not include nebular emission produced by
HII regions, and hence they are not adequate to describe
galaxies with spectra dominated by ongoing star formation,
like Mk59 or Mk71.
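
Projecting model spectra onto a principal plane defined by an observed sample, as done for figures 6 and 7, amounts to subtracting the sample mean and taking scalar products with the first two principal axes. The Python sketch below shows this for the covariance method only. The function names `fit_principal_plane` and `project`, and the synthetic "sample" and "track", are assumptions made for illustration; they stand in for the observed spectra and the B&C models, which are not reproduced here.

```python
import numpy as np

def fit_principal_plane(sample):
    """Mean spectrum and first two principal axes of a sample (covariance method).

    sample: (n_galaxies, n_channels) array, one spectrum per row.
    """
    sample = np.asarray(sample, dtype=float)
    mean = sample.mean(axis=0)
    # Rows of vt are orthonormal principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(sample - mean, full_matrices=False)
    return mean, vt[:2]

def project(spectra, mean, axes):
    """Coordinates of one or more spectra in the plane spanned by `axes`."""
    return (np.atleast_2d(spectra) - mean) @ axes.T

# Synthetic stand-in for the observed sample (NOT real galaxy data).
rng = np.random.default_rng(1)
wavelength = np.linspace(3750.0, 6500.0, 238)
red = wavelength / 5000.0
blue = 5000.0 / wavelength
mix = rng.uniform(0.0, 1.0, size=30)
sample = np.outer(1.0 - mix, red) + np.outer(mix, blue)
sample += 0.01 * rng.standard_normal(sample.shape)

# Hypothetical "evolutionary track": a model that reddens steadily with age.
track_mix = np.linspace(1.0, 0.0, 5)
track = np.outer(1.0 - track_mix, red) + np.outer(track_mix, blue)

mean, axes = fit_principal_plane(sample)
coords = project(track, mean, axes)   # one (PC1, PC2) pair per age step
print(coords)
```

The plane is fitted to the sample alone, so the model spectra do not influence the principal axes; they are simply placed in a coordinate system defined by the galaxies. The sign of each axis returned by the decomposition is arbitrary, so only the relative positions along the track are meaningful.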



5.3 On the nature of the Hubble sequence 

Our results lead us to conclude that the spectral sequence
is also an evolutionary sequence, with galaxy spectra
evolving from those of Magellanic irregulars to those of ellipticals.
Consequently, we expect that the fraction of galaxies with
late-type spectra should increase with redshift. It is well known
that the fraction of blue objects increases with
redshift, as evidenced originally by the Butcher-Oemler (1978)
effect in clusters (see Rakos & Schombert 1995 for a
recent study) or by the dramatic excess (with respect to
no-evolution models) in galaxy counts at faint blue magnitudes
(see Ellis 1990 for a review). Additionally, recent
observations with the HST have been able to show that these
galaxies are mainly late-type spirals and irregulars (Glazebrook
et al. 1995; Driver, Windhorst & Griffiths 1995).

We have seen that there is a close correspondence
between the spectral and morphological sequences today, and
that the spectral sequence is also an evolutionary sequence. If
we do not live in a special epoch in the history of the
Universe, we may suppose that the spectral and morphological
sequences have always been related. Then the Hubble
sequence itself would be an evolutionary sequence, with
galaxies evolving from Im to E! This is a speculation, of course,
since our results refer to spectra and not to morphologies,
and the appearance of an E galaxy with an age of 1 Gyr
may be quite different from that of a present-day Sc galaxy. But it is
worth noting that some recent works present evidence that
the morphology of a normal galaxy may evolve from late
to early types. For instance, Pfenniger, Combes & Martinet
(1994) and Pfenniger, Martinet & Combes (1996) argue that
this evolution may be driven by internal and external
factors due to the likely coupling between dynamics and star
formation. The key point is that the dynamical processes that
act during the life of a galaxy (such as the formation and
destruction of bars, mergers, close encounters, gas compression
and/or stripping, etc.) tend to favour an increase of the spheroidal
component at the expense of the disk, leading to a single
direction of morphological evolution, from Sm to Sa. This
direction of evolution also explains the morphological content
of galaxy clusters, where most galaxies are E or S0. The
simplest explanation assumes that there is an infalling
population of late-type galaxies (Sodré et al. 1989, Kauffmann
1995) that are transformed into E and S0 (as well as into dwarf
ellipticals) in the hostile environment of the clusters. Moore
et al. (1996) have recently proposed that frequent
encounters at high speed among the galaxies in clusters ("galaxy
harassment") may be the driver of morphological
transformations in these environments. This process may explain
the Butcher-Oemler effect and the form of the blue objects
observed in high-redshift clusters by the HST.
Hierarchical models also indicate that galaxy morphology may well
change as a consequence of mergers and interactions (Baugh,
Cole & Frenk 1996), although these models do not yet have
enough resolution to address the question of evolution within
spiral types. This scenario also suggests that the first
galaxies should look more like faint gas-rich irregular objects, in
agreement with the inhomogeneous galaxy formation models
of Baron & White (1987), than like the presumed bright
precursors of today's elliptical galaxies and bulges of spirals in the
scenario devised by Eggen, Lynden-Bell & Sandage (1962).

Note that our results are consistent with this 'nurture' 
scenario, but they do not exclude by any means the 'nature' 
framework, where the morphology of a galaxy is already im- 
printed in the initial conditions at the moment of birth, with 
the galaxy spectrum evolving passively with time, without 
any significant morphological evolution. We hope that HST
observations will be able to constrain the amount of
morphological evolution of high-redshift galaxies and help to
solve the long-standing problem of the origin of the Hubble
sequence.



6 CONCLUSIONS 

We have applied PCA to the study of integrated spectra of
galaxies. We have found it useful to analyse the projected
distribution of galaxy spectra onto the principal plane, defined
as the plane in the data space that contains most of the data
variance. We have shown that normal galaxies delineate a
spectral sequence, which is related to the Hubble morphological
sequence, i.e., the spectrum of normal galaxies changes
smoothly from E to S0 to Sa, up to Sc and Im, without a gap
between ellipticals and spirals. The spectral sequence in the
principal plane is somewhat analogous to the main sequence
in the HR diagram. The existence of the spectral sequence
is independent of the data scaling and provides quantitative
support for the Hubble morphological sequence.

The fact that normal galaxies form a one-dimensional
spectral sequence may indicate that a single parameter
controls their integrated spectra (and the Hubble sequence).
With Bruzual & Charlot models we have shown that the
characteristic star formation time scale of galaxies provides a
good parametrization of the spectral sequence. Additionally,
the evolutionary tracks of normal galaxies, when projected
onto the principal plane defined by the spectra of present-day
normal galaxies, overlap the spectral sequence completely
(for E galaxies) or partially (for late-type galaxies). This
means that galaxy spectra evolve along the sequence, at least
for the simple evolutionary models discussed here. Taking
into account the close correspondence between the spectral
and morphological sequences, our results are consistent with
the hypothesis that galaxies may also evolve morphologically
along the Hubble sequence, from Sm/Im to E.

Figure 7. Projection of B&C evolutionary tracks for τ = 0 (continuous line) and τ = ∞ (dotted line) onto the principal plane: (a) NQS,
covariance method; (b) NQS, correlation method; (c) AQS, covariance method; (d) AQS, correlation method. Symbols have the same
meaning as in figure 3. The age (in Gyr) is indicated next to the tracks. Note that the track for τ = 0 completely overlaps the track
for τ = ∞ in the covariance method (7a and 7c).

It is worth noting that PCA depends on the data set,
since the analysis is done over the covariance or correlation
matrix of the data. As we have shown, however, our main
results are robust in the sense that they are, to a large extent,
independent of data scaling or spectral resolution. It will
be interesting to repeat this analysis when richer samples,
including high-redshift galaxies, become publicly available.